Triplet deep hashing method for speech retrieval
Qiuyu ZHANG, Yongwang WEN
Journal of Computer Applications    2023, 43 (9): 2910-2918.   DOI: 10.11772/j.issn.1001-9081.2022081149

Existing deep hashing methods for content-based speech retrieval make insufficient use of supervised information, which leads to suboptimal hash codes, low retrieval precision and low retrieval efficiency. To address these problems, a triplet deep hashing method for speech retrieval was proposed. Firstly, spectrogram image features were fed to the model as triplets to extract the effective information of the speech features. Then, an Attentional mechanism-Residual Network (ARN) model was proposed, in which a spatial attention mechanism was embedded in a Residual Network (ResNet), and the representation of salient regions was improved by aggregating the energy-salient region information across the whole spectrogram. Finally, a novel triplet cross-entropy loss was introduced to map the classification information and the similarity between spectrogram image features into the learned hash codes, thereby achieving maximal class separability and maximal hash code discriminability during model training. Experimental results show that the efficient and compact binary hash codes generated by the proposed method achieve recall, precision and F1 score of over 98.5% in speech retrieval. Compared with methods such as the single-label retrieval method, the average running time of the proposed method using Log-Mel spectra as features is shortened by 19.0% to 55.5%. Therefore, the method significantly improves retrieval efficiency and retrieval precision while reducing the amount of computation.
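
To make the triplet and hash-code ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes real-valued embeddings are binarized by thresholding at zero and compared by Hamming distance, and it uses a standard margin-based triplet penalty as a stand-in: the abstract does not give the formula of the triplet cross-entropy loss, so that loss is not reproduced here. All function names and the `margin` parameter are hypothetical.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> List[int]:
    """Turn a real-valued embedding into a binary hash code (assumed: threshold at 0)."""
    return [1 if x > 0 else 0 for x in embedding]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def triplet_penalty(anchor: Sequence[int], positive: Sequence[int],
                    negative: Sequence[int], margin: int = 2) -> int:
    """Generic margin-based triplet penalty on Hamming distances.

    It is zero once the negative is at least `margin` bits farther from the
    anchor than the positive is. This is a stand-in, not the paper's loss.
    """
    return max(0, hamming(anchor, positive) - hamming(anchor, negative) + margin)


def retrieve(query: Sequence[int], database: Sequence[Sequence[int]],
             top_k: int = 1) -> List[int]:
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
    return ranked[:top_k]


# Toy embeddings standing in for network outputs on spectrogram features.
anchor = binarize([0.9, -0.2, 0.4, -0.7])     # same class as `positive`
positive = binarize([0.8, -0.1, 0.3, 0.2])
negative = binarize([-0.5, 0.6, -0.3, 0.9])   # different class

penalty = triplet_penalty(anchor, positive, negative)
nearest = retrieve(anchor, [negative, positive], top_k=1)
```

In this toy case the positive differs from the anchor in one bit and the negative in four, so the penalty is zero and the positive is ranked first. Comparing compact binary codes by Hamming distance is what keeps retrieval cheap relative to comparing real-valued features.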
